Supplement for: Domain prediction with probabilistic directional context

Authors

  • Alejandro Ochoa
  • Mona Singh
Abstract

Pfam 30 (16,306 HMMs) provides the PfamA.full.uniprot file, which corresponds to UniProt 2016_02 (46,974,580 proteins). We used this file to obtain dPUC2’s observed family pair counts, CODD’s list of certified domain pairs [1], and DAMA’s domain information and observed architectures [2]. We predicted domains with hmmscan from HMMER 3.1b2, the version required by Pfam 30. We downloaded the newest UniRef50 version (dated 2017-02-27, 20,905,476 proteins) and randomly selected a subset of 1,000,000 proteins for the Pfam 30 version of the RevSeq FDR test, which otherwise follows the earlier procedure [3]. All domain prediction methods were run as described in the main text.
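The random selection of a protein subset can be illustrated with a minimal sketch. The text does not say how the 1,000,000 proteins were drawn, so the reservoir sampling used here, the fixed seed, and the helper names read_fasta and reservoir_sample are all assumptions made for illustration, not the authors' procedure.

```python
import random
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without '>', sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reservoir_sample(records: Iterable[Record], k: int, seed: int = 0) -> List[Record]:
    """Draw k records uniformly at random in one pass (Algorithm R).

    Hypothetical helper: the seed and the sampling scheme are assumptions.
    If fewer than k records exist, all of them are returned.
    """
    rng = random.Random(seed)
    sample: List[Record] = []
    for i, record in enumerate(records):
        if i < k:
            sample.append(record)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)
            if j < k:
                sample[j] = record
    return sample
```

To draw a subset of the size quoted above, one would pass an open FASTA file handle to read_fasta and call reservoir_sample on the result with k=1_000_000. A single pass with bounded memory matters here because the full UniRef50 release holds roughly 20.9 million proteins.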


Publication date: 2017